BBC RD 1979/26 
RESEARCH DEPARTMENT REPORT 



COPAS - a high speed real-time 
digital audio processor 

G.W. McNally, B.Sc, C.Eng., MIEE 



Research Department, Engineering Division 

THE BRITISH BROADCASTING CORPORATION November 1979 




UDC 621.391: 
681.3.06 



COPAS - A HIGH SPEED REAL-TIME DIGITAL AUDIO PROCESSOR 
G.W. McNally, B.Sc, C.Eng., MIEE 



Summary 

The availability of computing power at decreasing cost has made 
real-time digital signal processing effective in areas as diverse as speech pro- 
cessing, seismic studies, radar and sonar. The purpose of the work described 
in this Report was to investigate the effectiveness of such processing when 
applied to high quality digital audio signals; accordingly a special purpose 
processor was designed and constructed. It is known as COPAS: a Com- 
puter for Processing Audio Signals. 

The requirements of the processor were that it should be efficient 
at executing signal processing algorithms; that it should be programmable 
for flexibility; and that it should be supported by a range of both hardware 
and software aids to simplify fault finding and program development. Also, 
as a result of previous work using a commercially available computer, an 
improvement in processing power of at least a factor of three was desired 
when compared with the earlier equipment. 

The COPAS processor met these objectives using a pipelined, micro- 
programmed architecture. The design was realised on a single printed circuit 
board and utilised two separate processors, one for the high speed signal 
processing and the other for slower operations. Communication between 
the two processors was arranged so that it incurred no processing penalty on 
the high speed processor, so ensuring that performance was maximised. The 
slower microprocessor performs all 'housekeeping' tasks for the system and 
supports a range of peripherals including a terminal through which the 
system is controlled. The input/output and control circuits for the pro- 
cessor were designed so that several identical processors could be connected 
together to form an array processor for large processing tasks. 

The development of programs was assisted by an assembler and 
several programs were successfully run on COPAS and its performance was 
confirmed. 



Issued under the authority of 






Research Department, Engineering Division, 

BRITISH BROADCASTING CORPORATION Head of Research Department 

November 1979 
(PH-212) 



COPAS - A HIGH SPEED REAL-TIME DIGITAL AUDIO PROCESSOR 

Section Title 

Summary 

1. Introduction 
2. Glossary of terms 
3. Objectives and experience 
4. Operations required of the digital audio processor 
5. Architectural factors 
6. Architectural features of COPAS 
    6.1. Choice of processor 
    6.2. Internal bus layout 
    6.3. Pipelining 
    6.4. Support processor 
        6.4.1. User interface 
        6.4.2. Timing generation 
        6.4.3. Dynamic data exchange (DDE) 
        6.4.4. Array processing 
        6.4.5. Structure of the support processor 
    6.5. Program sequencing 
    6.6. Input/output 
    6.7. Scratchpad memory 
    6.8. Construction 
7. Programming 
    7.1. Instruction format 
    7.2. Mnemonic description 
    7.3. The cross-assembler 
    7.4. Program locking 
    7.5. Operating system description 
    7.6. Program example 
        7.6.1. Data declaration area 
        7.6.2. Program statement area 
        7.6.3. The reset code 
        7.6.4. Initialisation 
        7.6.5. Execution of the biquadratic algorithm 
        7.6.6. Program locking code 
8. Discussion of performance and applications 
9. Conclusions 
10. References 






COPAS - A HIGH SPEED REAL-TIME DIGITAL AUDIO PROCESSOR 
G.W. McNally, B.Sc, C.Eng., MIEE 



1. Introduction 

Digital signal processing is an increasingly im- 
portant technology. The availability of computing 
power at ever decreasing cost has made real-time 
digital signal processing an effective solution in 
areas as diverse as speech processing, seismic 
studies, and radar or sonar detection. 

In many applications, the power of the real-time 
processor is used to pre-process data, reducing 
the amount of data collected before analysis on a 
conventional computer in the usual way. For 
high quality audio signal processing, many appli- 
cations will require that the entire process is 
executed in real-time. The common feature of 
all these systems is that processors are required 
which can perform complicated signal processing 
algorithms at very high speeds. However, these 
diverse applications have resulted in a variety of 
architectures being proposed, working at various 
levels of performance. 

Traditionally, processors have been hard- 
wired, as such an arrangement can usually be 
made to work at a higher speed than a programm- 
able machine. However, recent advances in micro- 
processing permit complex but programmable 
architectures to be designed which have high 
execution rates. Such a machine will have flexi- 
bility in every aspect compared with a hard-wired 
processor and the extra design work involved 
can be amortized over many applications. 

In this report, the design principles for a 
digital processor operating on high quality digital 
audio signals are discussed; the processor is known 
as COPAS, denoting COmputer for Processing 
Audio Signals. The processor incorporates the 
advantages of LSI technology so that the whole 
machine is constructed on a single printed circuit 
card carrying about 100 chips. It is capable of 
a cycle time of 135 ns using a pipelined, micro- 
programmed architecture. Program development 
by a user is backed by comprehensive support 
at both the hardware and the software level. A 
resident operating system permits the user to load, 
edit and monitor the program execution in the 
processor, and a cross-assembler has been written 
so that programs can be prepared in mnemonic 
language. A set of mnemonics has been defined 
and the microcode output from the assembler 
can be loaded directly into COPAS. 



The prototype COPAS has over 40% of its 
hardware committed to simplifying the user 
interface and adding facilities over and above 
its primary purpose of signal processing. Several 
cards can be connected together to form a multi- 
processor system, with all control information 
sourced via an IEEE 488 standard bus. 

The machine has been used successfully in 
experiments in digital filtering, reverberation and 
signal synthesis. The development of software 
for COPAS is an area of considerable work and is 
on such a scale that it cannot be included in this 
Report, but will be the subject of a future Report. 
The architecture of the processor is very strongly 
influenced by the software implementation of 
digital signal processing algorithms and experience 
gained in previous work has given an enhanced 
perspective to the advantages and disadvantages 
of a particular architecture. 



2. Glossary of terms 

The juxtaposition of computer 'jargon' and 
audio signal processing terms may lead to some 
confusion. This section is therefore included 
for those readers not familiar with some of the 
terms used in this Report. The reader is referred 
to Reference 1 for a treatise on the principles 
underlying the choice of the terminology. 

Glossary:— 

1. ARITHMETIC LOGIC UNIT (ALU) - A 
hardware unit which carries out operations 
on data under the control of the current 
microinstruction (e.g. add, shift, logical 
NAND, etc.) 

2. ASSEMBLER - A computer program that 
maps a series of mnemonic instructions (a 
source program) into a set of machine instruc- 
tions or microinstructions (an object pro- 
gram). When this program is executed on 
a different machine to the one for which the 
object code is destined, it is termed a cross- 
assembler. A microprogram assembler is 
sometimes termed a meta-assembler. 

3. ARCHITECTURE - aspects of the computer 
structure that are visible to the programmer, 
e.g. instruction set, registers, etc. 






4. FIRMWARE — microprograms resident in the 
computer's control memory; usually used 
in the context of read-only memory (ROM) 
or programmable ROM (PROM) micro- 
programs. 

5. MACHINE INSTRUCTION - A bit pattern 
that is interpreted by hardware into one or 
more microinstructions for execution. 

6. MICROINSTRUCTION - A bit pattern 
which directly controls the processor hard- 
ware during one or more machine cycles. 
A microinstruction specifies one or more 
micro-operations. 

7. MICROOPERATION - an elementary opera- 
tion such as register to register transfer. 

8. MICROPROGRAM MEMORY - A store 
which holds a series of microinstructions 
which constitute the stored program. Also 
known as a control store. 

9. OPERATING SYSTEM - A program which 
carries out the 'housekeeping' of a computer 
and controls input/output devices. 

10. PIPELINE — A register which contains the 
microinstruction currently being executed 
by the machine. 

11. PROGRAM SEQUENCER - A hardware 
unit which determines the order of execution 
of microinstructions. 

12. RANDOM-LOGIC - refers to the pattern of 
gates and interconnections resulting from the 
often ad hoc nature of the design and layout. 



3. Objectives and experience 

Digital techniques made their initial impact on 
high quality audio in areas where analogue tech- 
niques could no longer satisfy requirements. 
After the successful transmission of digitised audio, 
digital special effects units and magnetic tape 
recorders were developed and so it was a logical 
step to provide digital signal processing to accom- 
pany these recorders. 

Initial experience was gained in this area by 
using a commercial high speed computer and 
although the performance of the system that was 
constructed was somewhat limited, a number 
of useful investigations were carried out.² Knowledge 
of digital signal processing algorithms was 



accumulated and an estimate was made of the 
processing power required to further the work. 
The use of a programmable machine was shown 
to have great advantages, and the equipment was 
inevitably used in applications that had not been 
originally foreseen, for example, in comparing 
companding systems for digital audio.³ 

Stimulated by the results of this work, it 
was decided to develop a purpose-built processor 
for digital audio. The emphasis of the original 
work had been on executing those functions 
commonly found in mixing desks by digital means, 
and it was found that to execute the functions 
contained in just one channel of a mixer would 
have exceeded the capabilities of the computer 
used by a factor of three. Additionally, the 
computer used (a Plessey Miproc) was bulky 
and expensive and was not designed with audio 
applications in mind. 

The aim was therefore to design a programm- 
able processor, ideally on a single card, specifically 
for audio processing, but with a sufficiently 
flexible structure that a wide range of experi- 
ments could be performed. 

The processor should be capable of expansion 
and should also be configured so that a 'minimum 
system' could later be derived to suit a particular 
application. Added to this were the usual require- 
ments for minimum cost, size and power con- 
sumption combined with the highest possible 
speed. It was with this goal that COPAS was 
designed. 



4. Operations required of the digital audio pro- 
cessor 

A great deal has been written on signal processing 
and digital signal processing. In general, 
filter theory and spectrum analysis predominate, 
and processing in the frequency domain is reserved 
for those applications where large blocks of data 
are to be analysed. 

Consider some of the algorithms that may 
be required of a processor operating on a sampled 
signal $f_k$, where $k = nT$, $n = 0, 1, 2, \ldots$ and 
$T$ = sample period. 

1. Finite Impulse Response (FIR) filter 

$$y_k = \sum_{i=0}^{M-1} a_i f_{k-i}$$

where the $k$th output is the weighted sum of 
the $M$ past inputs. 

2. Infinite Impulse Response (IIR) filter 

$$y_k = \sum_{i=0}^{M-1} a_i f_{k-i} + \sum_{j=1}^{N} b_j y_{k-j}$$

where the $k$th output is now the weighted 
sum of $M$ past inputs and also $N$ past outputs. 

3. Correlation may be required, for example, 

$$y_k = \sum_{i=-N}^{N} f_{i+k} s_i$$

4. The link between the time and frequency 
domain is commonly achieved by using the 
Discrete Fourier Transform. 

$$A_r = \sum_{k=0}^{N-1} f_k \exp(-2\pi j r k / N)$$

where $j = \sqrt{-1}$ 

$N$ = number of samples 
$r$ = harmonic number = 0, 1, 2, ..., $N-1$ 
$k$ = sample number = 0, 1, 2, ..., $N-1$ 
$A_r$ = $r$th coefficient of the Fourier transform. 

It can be seen that the prevalent operations are 
multiplication (possibly complex) and summation. 
Also the execution of these algorithms is data 
independent and so no branching from the normal 
program flow is required. 
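
The predominance of multiplication and summation can be seen in a minimal Python sketch of the FIR and IIR filters described above. This is an illustration only and not COPAS microcode: the function names, the coefficient lists `a` and `b`, the convention that feedback terms are added, and the use of floating point arithmetic are all assumptions made for the example.

```python
def fir(a, samples):
    """Each output is the weighted sum of the M most recent inputs."""
    history = [0.0] * len(a)            # past inputs, newest first
    out = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0.0
        for coeff, value in zip(a, history):
            acc += coeff * value        # one multiply and one add per tap
        out.append(acc)
    return out


def iir(a, b, samples):
    """Each output also includes a weighted sum of the N most recent outputs."""
    x_hist = [0.0] * len(a)             # past inputs, newest first
    y_hist = [0.0] * len(b)             # past outputs, newest first
    out = []
    for x in samples:
        x_hist = [x] + x_hist[:-1]
        acc = sum(c * v for c, v in zip(a, x_hist))
        acc += sum(c * v for c, v in zip(b, y_hist))
        y_hist = [acc] + y_hist[:-1]
        out.append(acc)
    return out
```

Neither loop contains a test on the signal values, which is the sense in which the execution is data independent.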

There are, however, other applications where 
the processing is data dependent. For example, 
in audio companding a non-linear transfer function 
must be realised and this can be achieved only by 
performing tests on the sampled signal. Further, 
the storage requirements for microprogram and 
the ease of writing programs are greatly improved 
with the ability to perform microsubroutines. 
However, each of these operations requires con- 
ditional branching and therefore some non-trivial 
program sequencing logic. 
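
A data-dependent operation of this kind can be illustrated with a simple piecewise-linear compression law. The sketch below is hypothetical: the `knee` and `ratio` parameters are invented for the example and do not describe any companding system referred to in this Report. The point is only that the path taken through the program depends on a test of the sample value.

```python
def compress(sample, knee=0.25, ratio=4.0):
    """Piecewise-linear compression: unity gain below the knee, reduced slope above."""
    magnitude = abs(sample)
    if magnitude <= knee:               # test on the sampled signal
        out = magnitude
    else:                               # conditional branch to the second segment
        out = knee + (magnitude - knee) / ratio
    return out if sample >= 0 else -out
```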

A significant problem in the processing of 
data is that of memory addressing. Consider 
an FIR filter in which the data required for the 
calculation is symmetrical about the centre of 
the filter. Assuming the data is held in Random 
Access Memory (RAM), a weighted sum of the 



data in the block must be performed by address- 
ing each data in turn. However, at each complete 
calculation of the filter, new data will be entered 
into one location within the block, overwriting 
old data that is no longer required. This means 
that the 'centre' of the filter has moved in memory 
addressing terms, and new address patterns have 
to be generated in subsequent executions of the 
filter program. If the block is of size N, then 
the addresses have to be incremented modulo 
N. It can therefore be seen that the workload 
involved in merely addressing data can be con- 
siderable and can be even more severe in, for 
example, the execution of an FFT algorithm. 
The burden has resulted in some high speed signal 
processors having a quite separate processor 
reserved entirely for generating these complex 
address schemes. 
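
The modulo-N addressing described above can be sketched as a circular delay line. This is a minimal illustration in Python rather than a description of the COPAS addressing hardware; the class and method names are assumptions made for the example.

```python
class CircularDelayLine:
    """A block of N memory locations in which new data overwrites the oldest."""

    def __init__(self, n):
        self.n = n
        self.mem = [0.0] * n
        self.newest = 0                  # address of the most recent sample

    def write(self, sample):
        # The write address advances modulo N, so the 'centre' of the
        # filter moves through memory on every sample period.
        self.newest = (self.newest + 1) % self.n
        self.mem[self.newest] = sample

    def read(self, delay):
        # Address of the sample written 'delay' periods ago, modulo N.
        return self.mem[(self.newest - delay) % self.n]


def fir_step(line, coeffs, sample):
    """Enter one new sample and return the weighted sum over the block."""
    line.write(sample)
    return sum(c * line.read(i) for i, c in enumerate(coeffs))
```

Every read requires an address calculation in addition to the multiply and add, which is the workload referred to in the text.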

A further contribution to the addressing 
problem is that of providing coefficients for these 
algorithms. Although for a particular program 
these will be fixed, it is necessary to have a flexible 
means of loading and changing these coefficients 
so that the parameters remain under the control 
of the programmer. In particular, it may be a 
requirement to change them without interrupt- 
ing the signal processing operations so that there 
is no objectionable disturbance in the output 
data. 

Input and output ports have to accept data 
at the real-time data rate and in audio processing 
it is common for several outputs to be generated 
from a single input. These I/O ports should be 
designed so that several processors can be con- 
nected together so that the computational pay- 
load can be evenly distributed. This implies that 
real-time data can be transmitted between units 
along a data highway. Similarly, these processors 
should be under the control of an executive 
computer with commands issued via 
a single control bus. 

As a general purpose machine, the architec- 
ture should be such that changes can be made 
easily. A common requirement would be to 
extend the memory range or add extra input/ 
output facilities. Also, with technology changing 
so quickly, it is worth keeping in mind that new 
processing elements, for example, may offer 
enhanced performance quickly and easily if they 
can be incorporated into the overall system archi- 
tecture. 

5. Architectural factors 

For a signal processor to work in real-time, a 






very high computation rate is required. By careful 
design, a processing speed-up of 100-to-l can be 
achieved relative to the processing in most general 
purpose machines.⁴ The specification of a computer 
which satisfies the particular speed, cost, 
size, power and other required specifications of a 
given application can be examined in several 
areas:— 

(a) Technology — the requirement for a small 
low-cost processor operating at high speed restricts 
the choice of technology to TTL or ECL. With 
the recent availability of Schottky TTL micro- 
processors and a range of compatible memory and 
multiplier circuits at LSI gate densities, TTL is 
very attractive to the designer. However, the 
complexity of ECL devices is steadily increasing, 
and ECL microprocessors are under development, 
so the choice may not be so clear in the future. 
With multiplication as the dominant factor in 
signal processing, the emergence of single chip 
16 X 16 multipliers is a significant influence. 

(b) Pipelining and Parallelism — In pipelining, a 
process is split into several tasks, each task being 
handled by a reserved piece of programmable 
hardware. Thus execution proceeds assembly-line 
style and many different processes are accom- 
plished at the same time. A good example of pipe- 
lining is common to many processors — the ability 
to fetch the next instruction while the current 
instruction is being executed.⁵ However, the 
method can be extended to the arithmetic unit 
where a task is broken down into a number of 
sequential operations, each stage of the pipeline 
has its own associated program, and each stage 
performs one operation per machine cycle.⁶ 
The desire to increase the processing during each 
cycle can also be achieved by parallelism, i.e. 
replicating hardware with identical functions. 
For example, in a second order filter section 
a processor could have five identical multipliers 
so that all multiplications can occur in the same 
machine cycle. With both these methods there 
is a space/time tradeoff, i.e. speed can be improved 
at the expense of extra hardware.⁷ 
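
The second order section mentioned above can be sketched as follows to show why five multipliers would suffice. The coefficient names, the state layout and the convention that the feedback terms are added are assumptions made for the example; the sketch is not taken from the COPAS program.

```python
def biquad_step(coeffs, state, x):
    """One sample period of a second order section.

    coeffs = (b0, b1, b2, a1, a2); state = (x1, x2, y1, y2), newest first.
    """
    b0, b1, b2, a1, a2 = coeffs
    x1, x2, y1, y2 = state
    # The five products depend only on values already available, so with
    # five multipliers they could all be formed in the same machine cycle.
    products = (b0 * x, b1 * x1, b2 * x2, a1 * y1, a2 * y2)
    y = sum(products)
    return y, (x, x1, y, y1)
```

With a single multiplier the same five products must be formed in successive cycles, which is the space/time tradeoff in its simplest form.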

(c) Multiprocessing — The method of replicating 
hardware within a processor can be extended to 
replicating the entire processor itself. In multi- 
processing, each processor has its own program 
and performs just one part of the overall process. 
This can take the form of specialised modules 
for performing certain algorithms which in turn 
are linked with other processors to handle memory 
addressing and data input/output.⁸ These techniques 
clearly offer advantages for high speed 
operation but complicate programming greatly. 



Also, it is difficult to ensure that each processor 
keeps busy or conversely avoids data 'bottlenecks'. 

(d) Programming — The advantages of using a 
programmable signal processor rather than special 
purpose hardware are the lower cost of a general 
purpose architecture and the flexibility provided 
by the ability to change the program. The 
emphasis on speed and efficiency has naturally 
led away from the development of high-level 
languages for signal processing. Microprogramm- 
ing as a means for controlling circuits within 
digital computers was first proposed by Wilkes in 
1951.⁹ In his original scheme, control waveforms 
were directly generated from a read-only memory 
(ROM). With modern semi-conductor storage 
this concept has now been refined and developed 
into a powerful design technique. 

Microprogrammed machines are distinguished 
from non-microprogrammed machines in the 
following way. In a non-microprogrammed 

machine the control function is implemented by 
'random logic' — arrays of gates and flip-flops 
connected in a dedicated way to produce the 
required timing and control signals. A microprogrammed 
machine is, in comparison, a highly 
ordered machine in which a microinstruction 
determines the entire function of the machine 
during a clock cycle. In microprogramming the 
requirements of speed and flexibility can be 
brought together. 

It is worth distinguishing at this point be- 
tween the microprogramming technique that can 
be used in a signal processor and that used in a 
general purpose computer. In a general purpose 
computer a sequence of microinstructions is used 
to execute a machine instruction, and, as far as 
the programmer is concerned, the way in which 
the machine instruction is executed is of little 
concern. However for a very high speed machine, 
the translation process from a machine instruction 
to a sequence of microinstructions can be bypassed 
if the programmer is prepared to write programs 
directly at the lower degree of sophistication of the 
microinstruction. This will give economies of 
speed and complexity of hardware and takes 
away the limitations of a fixed machine-instruction 
set. However, the programmer will have to work 
with a more unwieldy language and will require 
a more intimate knowledge of the hardware 
architecture to avoid generating instructions which 
are intrinsically invalid. 

There are two approaches to achieving a given 
performance in a microprogrammed machine. The 
first is to use wide microinstructions, i.e. having 






many bits, so that the hardware can be controlled 
elegantly during each machine cycle. This results 
in complex microinstructions and a possibly 
uneconomic, wide microprogram memory. Alter- 
natively, the system designer can choose to in- 
crease the depth of microprogram, i.e. use more 
instructions, to achieve the same amount of 
processing. An architecture using wide micro- 
instructions is termed a horizontally micropro- 
grammed machine and one which uses many, 
shorter microinstructions, a vertically micro- 
programmed machine. 

In practice, the optimum design may fall 
between these two extremes, though for high 
speed the horizontal approach will be most effec- 
tive. However, there are two useful techniques 
for reducing the number of bits required to imple- 
ment a microinstruction. 

1. Formatting — In a fixed format, each bit 
position in the microinstruction always has the 
same meaning and so each field is used for the 
same purpose in each microinstruction. In a 
variable format, a field will have one function in 
one microinstruction and another function in 
another microinstruction. An example of variable 
formats, sometimes referred to as field overlay 
might be a branch address field which is also used 
for immediate data. The meaning of the field 
would be defined implicitly by another field 
within the microinstruction. 

2. Encoding — In a non-encoded field, each 
bit directly connects to a control line. It is often 
possible to rationalise the system so that only 
valid combinations of control lines are represented 
by bit patterns in the microinstruction. 
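
Both techniques can be illustrated with a small, entirely hypothetical microinstruction word. The 13-bit layout, the field names and the list of bus sources below are invented for the example and are not the COPAS instruction format. A 2-bit encoded field replaces four separate enable lines, and a 10-bit operand field is overlaid as either a branch address or immediate data.

```python
SOURCES = ["scratchpad", "multiplier", "input_port", "alu"]  # hypothetical bus sources


def pack(is_branch, operand, source):
    """Build a 13-bit word: format bit, 10-bit operand, 2-bit encoded source."""
    if not 0 <= operand < 1024:
        raise ValueError("operand must fit in 10 bits")
    return (int(is_branch) << 12) | (operand << 2) | SOURCES.index(source)


def unpack(word):
    """Decode a word into the control information the hardware would see."""
    code = word & 0b11
    operand = (word >> 2) & 0x3FF
    # Encoding: only one source may drive the bus, so two bits are expanded
    # into four enable lines of which exactly one is active.
    fields = {"enables": [int(i == code) for i in range(len(SOURCES))]}
    # Field overlay: the same ten bits are a branch address or immediate
    # data, depending on the format bit held elsewhere in the word.
    if (word >> 12) & 1:
        fields["branch_address"] = operand
    else:
        fields["immediate"] = operand
    return fields
```

In this sketch 13 bits do the work of the 25 that a fixed-format, non-encoded word would need (four enable lines, a 10-bit address, 10 bits of immediate data and one further bit to mark a branch).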

A fuller description of microprogramming 
techniques can be obtained from the references.¹⁰,¹¹,¹² 

3. Types of Arithmetic - The most fundamental 
decision that has to be made is whether to use 
fixed point or some kind of floating point repre- 
sentation of numbers. For a fixed word length, 
the floating point representation can handle a 
larger dynamic range than fixed point. Studies 
comparing the round-off noise produced in filters 
by these representations¹³ show the floating 
point representation to be superior only for filters 
having very high Q factor. The extra dynamic 
range permitted with floating point is needed to 
express accurately the high gains associated with 
these filters. The overall gain of a filter where for 
example a number of second order sections are cas- 
caded, can also be high. However, the problem can 



be evaded by scaling between filter sections, i.e. 
multiplication by a fixed constant, usually a power 
of two.¹⁴ Scaling can be used to avoid overflow 
and provide maximum precision throughout the 
calculations, combining the advantages of the 
better round-off noise of fixed point arithmetic 
within filter sections and the dynamic range capa- 
bility of floating point between cascaded sections. 

Another decision to be made is how to 
handle negative numbers. The two most popular 
methods are sign + magnitude and two's comple- 
ment. Two's complement notation is easier to 
implement since the rules for addition are the same 
regardless of the sign of the two operands. Also, 
an advantage of two's complement notation is its 
tolerance of overflow. In a sequence of additions 
the partial sums can be permitted to overflow as 
long as the final sum does not overflow. This in- 
creases the effective word length of registers in the 
processor and is an extremely useful feature.¹⁵ 
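
The tolerance of two's complement arithmetic to intermediate overflow can be demonstrated with a short Python sketch. A 16 bit word is assumed, and the helper names are invented for the example.

```python
BITS = 16
MASK = (1 << BITS) - 1


def wrap(value):
    """Reduce an integer to the value a 16 bit two's complement register would hold."""
    value &= MASK
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value


def accumulate(values):
    """Sum a sequence, letting every partial sum wrap as it would in hardware."""
    acc = 0
    for v in values:
        acc = wrap(acc + v)
    return acc
```

For the sequence 30000, 10000, -20000 the first partial sum wraps to -25536, yet the final result is the correct value of 20000 because the true sum lies within the 16 bit range.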

4. User Interface - The high speed signal processor 
on its own will be of little use unless it is 
backed by facilities more usually associated with 
a mini-computer, for example, devices to assist 
programming such as a terminal, paper tape or 
magnetic tape handling facilities and perhaps a 
printer. Bulk storage may be required for data 
used by the processor. From the user's point 
of view the signal processor might even be a special 
purpose peripheral to the host computer that 
he is using.'' ^''' '' This host computer may tackle 
many of the mundane tasks for the signal pro- 
cessor, such as the initialisation on turn-on, or it 
can be used to assemble microcode from higher 
level mnemonic statements; it can even be used 
in a diagnostic role, if suitable points within the 
signal processor are made accessible. 



6. Architectural features of COPAS 

In Section 4, the operations required of a digital 
signal processor were reviewed, and in Section 5, 
some of the architectural features that could be 
incorporated, were outlined. Linking these with 
the decision to make optimum use of LSI tech- 
nology and the requirement for low cost and small 
size, a processor based on bit-slices and a high 
speed multiplier chip was designed. A simplified 
block diagram of the processor is given in Fig. 1 
and the architectural features of this structure 
will now be discussed in detail. 

6.1. Choice of processor 

In the search for the highest possible speed, 



Fig. 1 - Programmers' block diagram of the COPAS system 
[Block diagram not reproduced. Labelled blocks: sequencer, microprogram memory (512 x 48), pipeline register (34 instruction lines), status register, input ports, output ports, multiplier, 16 bit 'bit slice' CPU, MUX, scratchpad data memory (1K x 16), and the microprocessor support system with digital cassette, VDU, other peripherals and transceivers to the IEEE 488 bus.] 



some researchers have built their own arithmetic 
processors.'* '^'^^'^^ The hardware for these can 
be very extensive with up to 2500 integrated circuits 
of MECL 10K series for the basic processor 
alone.²⁰ However, if a limit to the amount of 
processing to be executed is known and can be 
defined, then the processor can be designed within 
that limit. This is atypical of the previous ex- 
amples where the goal is often to perform a partic- 
ular algorithm in the shortest possible time within 
the limits of the technology used and cost. The 
yardsticks used in the design of COPAS were 

1) that it should be at least three times as power- 
ful as the first generation equipment based 



on the Miproc mini-processor. 

2) that it should execute 10 biquadratic sections 
or a 64 stage transversal filter in the period 
corresponding to a sampling rate of 32 kHz. 
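
The second yardstick can be turned into a rough cycle budget. The sketch below combines the 32 kHz sampling rate with the 135 ns cycle time quoted in the Introduction; it assumes, for illustration only, that every machine cycle in the sample period is available for filter arithmetic, which takes no account of input/output or other overheads.

```python
SAMPLE_RATE_HZ = 32_000     # sampling rate quoted in the second yardstick
CYCLE_TIME_NS = 135         # cycle time quoted in the Introduction


def cycles_per_sample(sample_rate_hz, cycle_time_ns):
    """Whole machine cycles that fit into one sample period."""
    period_ns = 1_000_000_000 // sample_rate_hz    # 31250 ns at 32 kHz
    return period_ns // cycle_time_ns


budget = cycles_per_sample(SAMPLE_RATE_HZ, CYCLE_TIME_NS)
per_biquad = budget // 10       # cycles available to each of 10 biquadratic sections
per_tap = budget / 64           # cycles available to each tap of a 64 stage filter
print(budget, per_biquad, round(per_tap, 2))
```

On these assumptions there are 231 cycles per sample period, or about 23 cycles for each biquadratic section and between three and four cycles for each stage of the transversal filter.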

It was recognised that such requirements 
could be met by a processor using bit slices.²¹,²² 
These devices have been developed since 1974, in 
order to integrate large blocks of architecture on 
chip so that computers could be built more easily. 
However, working at high speed results in higher 
power dissipation and thus a lower gate density 
on chip, with the result that the number of func- 
tions that could be placed on a single integrated 





Fig. 2 - 74S481 bit-slice architecture 
[Diagram not reproduced. Labelled elements: hardwired algorithms, micro-decode logic array, AO MUX select, program counter (PC), memory counter (MC), latch, carry input (CIN), X and Y outputs, overflow, address out port.] 



circuit was limited by pin-count considerations. 
Circuits which are designed for a horizontally 
segregated structure (i.e. those in which the control 
path is partitioned) are less thrifty with pin-outs 
than those for a vertically segregated structure (i.e. 
those having a partitioned data path). By maxi- 
mising the internal vertical interconnections at 
the expense of external horizontal interconnec- 
tions a building block can be identified from which 
a central processor of any desired word length 
can be constructed by simply cascading these 
blocks. The result is a high speed, highly integrated, 
medium-cost processor. 

The most highly integrated bit-slice available 
during the design period was the Texas 74S481, 
and this was chosen for use in COPAS. It requires 
a minimum of support chips to build a processor 
and, as will be seen in a later Section, also incor- 
porates a rudimentary address processor. Unlike 
other bit-slices, the 74S481 does not contain many 
on-chip registers, but this does not prove to be a 
handicap for signal processing applications, and if 
necessary the designer can add external register 
files. A block diagram of the internal structure 
is reproduced in Fig. 2. As can be seen, the chip 
consists of an arithmetic logic unit (ALU) capable 
of add, subtract, logical and shift operations, 
coupled via the A and B input multiplexers. These 
multiplexers provide elementary logical functions 
as well as source selection for the ALU. Data 
from the ALU is routed via another multiplexer, 
which provides a one-bit left or right shift, to a 
number of destinations, including two indepen- 
dent counters (labelled PC and MC). The out- 
puts from these counters are routed to the AOP 
outputs and can be used to 'point' into memory. 
A total of 17 bits completely determines the 
current instruction being executed, and 24780 
unique operations are available in the bit-slice. 

Four four-bit-slices are connected together, 
giving a 16 bit machine, and a high speed of operation 
is maintained by using a fast look-ahead 
carry generator. This is necessary because the 
result of an arithmetic operation is not stable 
until all the carries produced from performing that 
operation in less significant positions have rippled 
through to the most significant position. The 
technique of anticipating the carry so that there 
is no need to wait for a ripple carry to propagate 
through the adder is termed a 'carry look-ahead 
adder'. This technique is applied internally across 
groups of 4 bits within the bit-slice and externally 
across a group of four bit slices using a 74S182 
carry look-ahead chip. 
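
The two levels of carry look-ahead can be modelled in Python as below. This is a behavioural sketch under the usual generate/propagate formulation and is not a description of the internal logic of the 74S481 or 74S182; the function names are assumptions made for the example.

```python
def all_of(bits):
    """Logical AND of a list of bits (1 for an empty list)."""
    result = 1
    for b in bits:
        result &= b
    return result


def lookahead_carries(g, p, c0):
    """Carry into each position, written directly in terms of g, p and c0.

    No carry is derived from the one before it, so nothing has to ripple.
    """
    carries = []
    for i in range(len(g) + 1):
        c = c0 & all_of(p[:i])
        for j in range(i):
            c |= g[j] & all_of(p[j + 1:i])
        carries.append(c)
    return carries


def add16(a, b, carry_in=0):
    """Add two 16 bit words using look-ahead within and across four 4-bit slices."""
    slices = []
    for s in range(4):
        na, nb = (a >> 4 * s) & 0xF, (b >> 4 * s) & 0xF
        g = [(na >> i) & (nb >> i) & 1 for i in range(4)]        # generate
        p = [((na >> i) ^ (nb >> i)) & 1 for i in range(4)]      # propagate
        slices.append((g, p))
    group_g = [lookahead_carries(g, p, 0)[4] for g, p in slices]
    group_p = [all_of(p) for g, p in slices]
    group_c = lookahead_carries(group_g, group_p, carry_in)      # external stage
    total = 0
    for s, (g, p) in enumerate(slices):
        c = lookahead_carries(g, p, group_c[s])                  # internal stage
        for i in range(4):
            total |= (p[i] ^ c[i]) << (4 * s + i)
    return total, group_c[4]
```

The external stage needs only the group generate and propagate signals from each slice, which is why a single additional chip can serve all four slices.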

In such a configuration the 74S481 can be 



programmed to carry out multiply and divide 
operations. However, this is based on an inter- 
nally fixed macroinstruction, i.e. one which auto- 
matically decodes to a sequence of microinstruc- 
tions, and it requires 16 clock cycles to multiply 
two 16 bit numbers. This incurs too great a time 
penalty and, instead, a single chip multiplier 
manufactured by TRW is used which executes 
a 16 x 16 multiply in 160 ns giving a 32 bit product.
This chip is interfaced via the bi-directional
B port of the 74S481 so that multiplicand, multiplier
and product are transferred between the ALU
and the multiplier chip on a single 16 bit bus. The
multiplier has its own control field in the micro- 
instruction (see Section 7.2). 

6.2. Internal bus layout 

The data paths connecting the individual 
components of the processor are generally shared 
by several elements. This makes the system 
flexible since each component can communicate 
with every other component via the one 'bus'. 
On the other hand this introduces a 'bottleneck' 
since only one device can 'talk' to the bus at any 
time. The bus structure of a processor is there- 
fore critical in determining the speed and effici- 
ency with which a given program can be executed. 
As usual there is a trade-off between speed and 
complexity so that by increasing the number of 
buses that can be talked to by the same device, 
the number of simultaneous data exchanges that 
can be executed is increased. To achieve this, 
however, involves increasing the parts count 
dramatically and poses a big interconnection 
problem. Again, the decision is in the hands of 
the designer and so in COPAS a three bus system 
was chosen, consisting of two data buses and an 
address bus. 

Most data transfers are accomplished using 
the bi-directional B bus which can be sourced by 
scratchpad memory, the high speed multiplier, 
input ports and the registers of the bit-slices. A 
second data bus sourced only by the bit-slice 
DOP outputs, permits data to be routed into the 
scratchpad memory or the output ports without 
tying up the B bus. The third bus is the address 
bus and is arranged so that it can be sourced either 
by the two counters of the bit-slice (via AOP) 
or the operand field of the microinstruction (via 
the pipeline register). This permits a range of 
addressing modes and greatly increases flexibility. 
Restricting the number of buses to three provides 
a sufficiently simple structure for the processor to 
be built on a double-sided printed-circuit board. 
Nevertheless, there is great potential for increasing
the processing power by modifying and increasing
the bus layout of this processor, for example by
routing coefficients to the multiplier chip on a
specially reserved bus. However, as will be seen,
many improvements of this type may restrict
the general purpose application of the machine.

Fig. 3 - One level pipeline architecture (microprogram sequencer with next address control, branch address and test inputs; microprogram memory; clocked pipeline register supplying control bits for CPU etc.)

6.3. Pipelining 

Pipelining is a very powerful technique for
increasing the amount of computation completed
during a machine cycle. COPAS incorporates one
level of pipeline which gives the first and largest
performance improvement over conventional architectures.
Fig. 3 shows a simplified block diagram
of this implementation which permits fetching of
the next microinstruction overlapped with the
execution of the current microinstruction. The
pipeline register, by definition, holds the microinstruction
currently being executed by the
machine. Simultaneously, while this is being
executed, the address of the next microinstruction
is applied to the microprogram memory and the
contents of that location are read and set up at
the inputs to the pipeline register. The performance
improvement results because as soon as
the execution of the current microinstruction has
been completed, the next microinstruction has
already been set up at the input to the pipeline.
For a machine like COPAS this corresponds to
nearly halving the cycle time.
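
The effect on cycle time can be sketched with a simple model. The fetch, execute and register overhead figures used below are purely illustrative assumptions and are not measurements from COPAS; the function name is also hypothetical. Without a pipeline the two phases follow one another, whereas with one level of pipeline they overlap and the longer phase sets the cycle time.

```python
def cycle_time_ns(fetch_ns, execute_ns, pipelined, register_overhead_ns=0):
    """Machine cycle time with and without one level of pipeline."""
    if pipelined:
        # Fetch of the next microinstruction overlaps execution of the
        # current one; the pipeline register adds a small overhead.
        return max(fetch_ns, execute_ns) + register_overhead_ns
    # Without a pipeline, fetch and execute are performed in sequence.
    return fetch_ns + execute_ns


# Illustrative (assumed) figures only.
unpipelined = cycle_time_ns(140, 150, pipelined=False)
pipelined = cycle_time_ns(140, 150, pipelined=True, register_overhead_ns=10)
```

When the two phases are of similar length, as assumed here, the pipelined cycle is a little over half the unpipelined one.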

The penalty for this improvement is the extra 
hardware for the pipeline register (a 48 bit register 



in the case of COPAS) and a more complex 
machine to program. This results because at any 
given moment some registers contain the results of 
the previous microinstruction executed (e.g. the 
status register), some registers contain the current 
microinstruction being executed (e.g. the pipeline 
register), and some registers contain data for the 
next microinstruction to be executed (e.g. the 
branch address field of the microinstruction). 

6.4. Support processor 

As described so far, the COPAS processor is 
a high speed, pipelined, microprogrammable 
machine. For it to be of any use it must be 
interfaced with the user/programmer and this can
be conveniently achieved via another computer, 
programmed at a higher level. The COPAS design 
uses a microprocessor (MPU) to implement this 
interface, but it also uses the same MPU to gener- 
ate many of the timing waveforms for the whole 
machine. More importantly, it assists the bit- 
slice processor in signal processing operations and 
provides the means for assembling many COPAS 
units into an array processor with considerably 
enhanced processing abilities. These four distinct 
areas can now be examined in turn. 

6.4.1. User Interface 

The COPAS processor was designed to be 
part of a completely self-contained audio signal 
processing station. As such, it is important that 
the user can communicate effectively for the 
purposes of loading, editing and monitoring the 
operation of programs. COPAS can interface 
with several peripherals, namely a terminal com- 
prising a video-display unit and keyboard, a digital 
cassette unit, and a paper tape reader. All system 
commands are supplied from the keyboard of the
terminal and are recognised by the operating 
system so that the required action is taken. The 
interface with the terminal is via an EIA standard 
RS232C serial link operating at 1200 baud. 

A high speed paper tape reader (500 charac- 
ters/s) can also be connected to the processor. 
The cross-assembler for COPAS currently runs on 
a time-shared computer bureau and the most convenient
way of transferring programs to COPAS is
by paper tape. The support processor itself can 
also be loaded with test programs and data by 
these means. 

The digital cassette unit is provided for long 
term storage of programs. The operating system 
has facilities for writing, reading and locating 
programs on the tape and also allows the tape
to be advanced and rewound using commands from
the terminal.

Other commands from the terminal utilise 
the support processor as a simple editor. (A full 
list of these commands is given in Table 1.) For 
example, the contents of the scratchpad data 
memory can be examined and optionally changed 
by the 'D' command. To execute this command
the user types a D followed by a data memory
address and a carriage return (CR). The contents
of that location are displayed on the terminal
together with a prompt. The user
can either insert new data or enter another CR,
which will display the contents of the incremented
data memory address. To quit the routine,
an escape character (ESC) is entered. Such
commands are of great use, for example in mani- 
pulating coefficients for a filter, or examining 
the results at an intermediate stage of execu- 
tion of the program. Extra commands are 
available for inserting, deleting or changing the 
machine code instructions, and individual in- 
structions can be examined in detail using a dis- 
assembler. This is necessary because the machine 
level instructions convey little meaning in their 
original form, even to an experienced program- 
mer. 
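
The behaviour of the 'D' command can be modelled by a short sketch. This is not the firmware of the support processor: the function name, the use of hexadecimal entry and the choice to step on to the next address after new data has been inserted are all assumptions made for illustration.

```python
def d_command(memory, start, entries):
    """Model an examine/modify session on the scratchpad data memory.

    entries holds what the user types at each prompt: "" for a bare
    carriage return, "ESC" to quit, or a hexadecimal word to store.
    Returns the (address, contents) pairs that were displayed.
    """
    displayed = []
    address = start
    for entry in entries:
        displayed.append((address, memory[address]))  # show current contents
        if entry == "ESC":
            break
        if entry != "":
            memory[address] = int(entry, 16) & 0xFFFF  # insert new 16 bit data
        # Assumption: the address is incremented after a store as well
        # as after a bare carriage return.
        address = (address + 1) % len(memory)
    return displayed


scratchpad = [0] * 1024
scratchpad[5] = 0x1234
session = d_command(scratchpad, 5, ["", "00FF", "ESC"])
```

In this session location 5 is examined, location 6 is changed to 00FF and the routine is left at location 7.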

Other features of the support processor assist 
in debugging programs. A software trace routine 
permits single stepping through a program with 
the contents of registers and memory addresses 
displayed on the terminal at each step. This is 
possible because the microprocessor has access 
to the three major buses and the program counter 
within the high speed processor, the control of 
which can be relinquished to either of the pro- 
cessors. 

A simple graphics facility for plotting the 
contents of the scratchpad memory on a set of 
fixed axes is also incorporated. 

6.4.2. Timing generation 

A number of the user interface functions 
require the generation of timing pulses, for exam- 
ple, loading the random access microprogram 
memory, or driving the control lines of the paper
tape reader. By originating all these timings 
in software a large amount of extra circuitry is 
avoided. The MPU also performs all the initialis- 
ation of registers, memory etc. on power-up and 
reset, so that the equipment is immediately avail- 
able to be used, and inserts a valid instruction in 
the COPAS program memory to ensure that hardware
conflicts do not occur.



Fig. 4 - Dynamic data exchange schematic (BI/O, DOP and ADDR buses; scratchpad data memory, 1K x 16; high-speed interfaces 1 and 2; input port to MPU; MPU system bus)



6.4.3. Dynamic data exchange (DDE) 

The support processor is also able to take 
an active role in the real-time audio processing. 
Clearly, it does not have the required speed to 
handle the digital audio, but it can assist in other 
ways. The principle is to have a shared area of 
memory — the scratchpad data memory in this 
case, which can be accessed both by the high 
speed processor and the support processor. Fig. 4 
shows in simplified form how this is achieved. 
The data and address (DOP and ADDR) buses 
which are normally under the control of the high 
speed processor are relinquished for one machine 
cycle of each sample period. The machine cycle 
used also has the purpose of synchronising the 
program of the high speed processor to the audio 
sampling pulses and so there is no time penalty 
in using it for other purposes. When data is to 
be transferred to or from the scratchpad, the 
support processor receives an interrupt which 
sets up data and addresses in the high speed ports 
in such a way that no timing conflicts can occur.* 
These bus arbitration routines are accessible to 
the programmer and are the key to simple com- 
munication between the HSP and the MPU. 



* DDE has been filed as provisional patent No. 12295/78.






When the MPU sources data for the scratch- 
pad, the method of transfer is simple. The DOP 
and ADDR buses are sourced by the two high 
speed MPU interfaces which have been set up 
with the data in (1) and the address with write 
command in (2). However, for reading data 
from the scratchpad into the MPU a technique 
has been used which avoids the need for extra 
hardware. Data from the scratchpad is routed 
from the BI/O bus via the bit-slice, on to the
DOP bus, using the same machine cycle as is used
for synchronisation. Interface (1) is then repro- 
grammed as an input port to the MPU so that 
data can be transferred. Interface (2) is used 
for the address but this time with a read command. 
Due to the time taken to set up the data and 
addresses in the support processor, data trans- 
fers can take place at the rate of one 16 bit 
word per 100 µs. This is sufficient, however, for
a variety of applications. 

As an example of the power of DDE, con- 
sider the HSP programmed as a digital filter with 
the filter coefficients stored in the scratchpad. 
The support processor can have access to a library 
of coefficients stored in PROM or entered exter- 
nally from a terminal, and also monitor selector 
switches. The program in the support processor 
examines the switches, infers the filter charac- 
teristic required, extracts the coefficients required 
for that filter and writes them into the scratchpad 
while the program in the HSP continues to run 
uninterrupted. By these means, the high speed 
processor reserves all its processing power for the 
real-time audio processing. 
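
The coefficient update described above can be modelled in a few lines. The sketch is a simplification under stated assumptions: a short non-recursive filter with integer coefficients is used purely for brevity, at most one word is transferred in each sample period (the actual rate is limited to one word per 100 µs), and all class and function names are hypothetical.

```python
from collections import deque


class SupportProcessor:
    """Queues words for transfer to the scratchpad by DDE."""

    def __init__(self):
        self.pending = deque()

    def queue_coefficients(self, base_address, coefficients):
        for offset, value in enumerate(coefficients):
            self.pending.append((base_address + offset, value & 0xFFFF))

    def sync_cycle(self, scratchpad):
        # Called during the synchronisation cycle, when the DOP and ADDR
        # buses are relinquished by the high speed processor.
        if self.pending:
            address, word = self.pending.popleft()
            scratchpad[address] = word


def run_filter(samples, scratchpad, mpu, coeff_address, taps):
    """High speed processor loop: one pass per sample period."""
    history = [0] * taps
    outputs = []
    for x in samples:
        mpu.sync_cycle(scratchpad)          # no time penalty for the HSP
        history = [x] + history[:-1]
        coeffs = scratchpad[coeff_address:coeff_address + taps]
        outputs.append(sum(c * h for c, h in zip(coeffs, history)))
    return outputs


scratchpad = [0] * 1024
scratchpad[0:2] = [1, 0]                    # initially pass the input through
mpu = SupportProcessor()
mpu.queue_coefficients(0, [0, 1])           # change to a one-sample delay
outputs = run_filter([10, 20, 30, 40], scratchpad, mpu, 0, 2)
```

The filter loop never stops while the coefficients are replaced. Because the words arrive one at a time, the filter briefly runs with a mixture of old and new coefficients, which a practical program may need to take into account.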

6.4.4. Array processing 

By definition, the amount of processing 
that can be performed on a continuous stream 
of real-time data is limited for a single piece of 
hardware. If the processing can be split into 
several tasks, however, each task can be executed 
by a separate, individually programmed processor. 
As a single card computer, COPAS lends itself 
well to being harnessed in large numbers and this 
task is simplified by its ease of interfacing. At 
this point we shall restrict the discussion to the 
part played by the support processor in this inter- 
facing scheme — its use for controlling the opera- 
tions in the HSP. 

In an array processor it will be necessary 
for an 'executive' computer to exert control over 
the individual processing elements in the array. 
This will invariably require data transfers between 
the HSPs and the executive. As we have seen,
the support processor can handle the first part 



of this transfer and it remains to complete this 
transfer to the executive in an orderly fashion. 
This is conveniently performed by connecting the 
individual processors to a standard bus such as 
the IEEE 488 standard bus, known also as the 
General Purpose Interface Bus (GPIB).^^ Each 
support processor takes the role of a 'talker' or 
'listener' and the 'controller' resides in the execu- 
tive computer. The standard provides for the 
transfer of messages and data to addressed talkers 
and listeners all of which are connected in parallel 
on the 16 line bus. Up to 14 talkers/listeners + 1
controller may be connected in this way. 

6.4.5. Structure of the support processor 

The processor is built round an 8085 
microprocessor chip fitted with 4 Kbytes of pro- 
gram, and 0.5 Kbytes of random access memory. 
Additionally, it has high speed I/O ports for inter- 
facing with COPAS, an RS 232C serial port for a 
link with a terminal, a dedicated chip for inter- 
facing with the GPIB and a large number of I/O 
ports to connect to peripherals. The 8085 family 
of chips was chosen because of its economy
of parts used; for example, the I/O ports are 
incorporated into the memory chips. 

6.5. Program sequencing 

The program sequencer is responsible for 
addressing the microinstructions in the order 
in which they are to be executed. At its simplest, 
this would take the form of a counter which 
increments on each clock cycle until the end of 
the program is reached. If no other control were 
available, the machine would only be able to 
execute a fixed pattern of microinstructions. 
COPAS uses a bipolar LSI sequencer chip - the 
Am2910, which can be used for conditional 
branching, up to five levels of nesting of sub- 
routines and has an independent loop counter. 
A block diagram of the sequencer chip is repro- 
duced from the manufacturer's literature in Fig. 5. 

During each microinstruction, the sequencer 
generates a 12 bit address from one of four sources:

1. the microprogram counter (µPC)

2. an external (direct) input (D) 

3. a register/counter (R) 

4. a five deep, last-in, first-out (LIFO) stack (F). 

Fig. 5 - Am2910 sequencer block diagram (register/counter with zero detector; 5 word x 12 bit stack with stack pointer; multiplexer selecting between D, R, F and µPC; microprogram counter-register and incrementer)

Consider the requirement to perform a conditional
branch, a basic requirement for a machine
to make decisions on the basis of previously
obtained results. Fig. 3 shows how this can be
accomplished. A test input to the sequencer
indicates the success/failure of a test, and an
operand field of the microinstruction is connected
back to the direct inputs of the sequencer. In
determining the address of the next microinstruction,
the sequencer will either connect the µPC
if the test fails (giving an address one greater than
the previous one), or will connect the D inputs if
the test is passed, thus generating an address which
resided in the operand field of the previous microinstruction.
In COPAS the tests can be selected
from the results of logical and arithmetic comparisons
and tests for overflow and carry-out bit.
In addition, one test is arranged so that it always
gives the same result, hence giving the ability to
do unconditional branching.
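
The selection between the µPC and the D inputs can be expressed as a minimal sketch. The function names are hypothetical and the model ignores the pipeline timing; it only shows the choice of next address and how a microinstruction that branches to itself can wait for a test to change.

```python
ADDRESS_MASK = 0xFFF  # the sequencer generates 12 bit addresses


def next_address(current, d_inputs, test_passed):
    """Select the address of the next microinstruction for a conditional jump."""
    upc = (current + 1) & ADDRESS_MASK   # microprogram counter: current + 1
    if test_passed:
        return d_inputs & ADDRESS_MASK   # branch address from the operand field
    return upc                           # test failed: continue in sequence


def wait_loop(loop_address, test_results):
    """Trace a microinstruction that jumps to itself while its test passes."""
    visited = []
    address = loop_address
    for passed in test_results:
        visited.append(address)
        address = next_address(address, loop_address, passed)
        if address != loop_address:
            break                        # the loop has been left
    visited.append(address)
    return visited
```

An unconditional branch corresponds to calling `next_address` with `test_passed` always true.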






Subroutines can also be executed using the 
push-down stack of the Am2910. When a jump 
to subroutine instruction is executed the next 
consecutive microinstruction address is 'pushed' 
on to the top of the LIFO stack and the address
of the subroutine is supplied by the direct inputs.
When the subroutine is completed a 'return from
subroutine' instruction is given and the multiplexer
now selects the output of the stack. The
stack is then 'popped', restoring the program 
sequence to the same point as it was before the 
execution of the subroutine. 
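
The use of the stack for subroutines can be traced with a small model. The operation names CONT, JSB, RTN and HALT are hypothetical labels chosen for this sketch and are not the instruction names of the Am2910; the five-level stack depth is taken from the description above.

```python
STACK_DEPTH = 5  # the sequencer stack holds five return addresses


def trace(program, start=0, max_steps=50):
    """Return the sequence of microinstruction addresses executed.

    program maps an address to a pair (operation, operand).
    """
    stack = []
    address = start
    visited = []
    for _ in range(max_steps):
        if address not in program:
            break
        visited.append(address)
        operation, operand = program[address]
        if operation == "JSB":
            if len(stack) == STACK_DEPTH:
                raise OverflowError("subroutines nested too deeply")
            stack.append(address + 1)   # push next consecutive address
            address = operand           # subroutine address from D inputs
        elif operation == "RTN":
            address = stack.pop()       # multiplexer selects the stack output
        elif operation == "HALT":
            break
        else:                           # "CONT": continue in sequence
            address += 1
    return visited


program = {
    0: ("CONT", 0),
    1: ("JSB", 10),
    2: ("CONT", 0),
    3: ("HALT", 0),
    10: ("CONT", 0),
    11: ("RTN", 0),
}
addresses = trace(program)
```

The trace leaves the main sequence at address 1, runs the subroutine at addresses 10 and 11, and resumes at address 2.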

There are a total of 16 methods of deriving 
the next address in the Am2910 and 14 of these 
are available in COPAS. The reader is referred 
to the manufacturer's literature for more complete
descriptions of these methods; a list of those
incorporated in COPAS is given in Table 2.

6.6. Input/output 

COPAS modules were designed to work in 
a totally synchronous environment. This greatly 
simplifies the input/output design so that, for 
example, a 16 bit register is used for each of the 
two inputs. Should extra inputs be required, a 
socket is available on the board complete with 
fully decoded control signals so that external 
devices can be connected. 

Since it is common for programs to gen- 
erate many outputs from the processing of a 
few inputs, the output ports are more sophisti- 
cated. These are implemented using dual-port 
memories which have the feature that any two 
words can be read from them simultaneously. 
Under microprogram control, output data may 
be written into any of the 16 memory locations, 
and under the control of a very simple external 
controller, these can then be multiplexed on to 
the single data output bus from the COPAS board. 
This approach has two significant advantages. 
Firstly, it is extremely economical on hardware 
while still providing a very flexible output system 
and secondly it facilitates the connection of 
COPAS units directly into a multiprocessor con- 
figuration. 

Again using a simple controller, the data-out 
bus and one of the data-in buses of each COPAS 
can all be connected to a single data highway so
that audio data can be passed between COPAS 
modules. The controller has only to define the 
order in which the data transfers take place, since 
all system control functions are handled by the 
GPIB described in Section 6.4. 



6.7. Scratchpad memory 

This is the common area of read/write
memory between the HSP and the MPU, size
(1K x 16). Since the bit-slices contain few
internal registers, this memory is heavily used for
storage of intermediate results, etc. As such it
is important that memory accesses do not incur
a time penalty by extending the duration of the
machine cycle. For this reason, it is constructed
using very high speed bipolar RAMs (with a worst-case
access time of 35 ns) and this enables the
bit-slice/memory combination to work at a guaranteed
cycle time of 135 ns. This was highly desirable
because it was known that the multiplier
chip which currently limits the cycle time of
COPAS to 160 ns was under development to produce
higher speed versions and would be compatible
in every other respect. Using the 16 bit
addressing capability of COPAS the memory can
be expanded to 64K words externally to provide
time delays, for example, but the first 1K words
are always reserved for high speed scratchpad
use and for DDE.

6.8. Construction 

The completed COPAS module is shown in 
Fig. 6(a)* It is assembled on a single printed 
circuit board measuring 370 mm x 270 mm and 
contains 100 integrated circuits with a power 
consumption of 40W. About 40% of this circuitry 
is accounted for by the development aids neces- 
sary for the prototype and for assisting in debug- 
ging programs. The module contains all input/ 
output interfaces and a crystal oscillator from 
which all timing, including sampling rates, is derived.
The module is therefore totally self-contained
and can be used as a stand-alone system,
Fig. 6(b), or with other modules to form a larger
multiprocessor system.

Hardware testing has been simplified by 
building sockets into the module which can be 
connected directly to a logic analyser. By these 
means 75 points in the processor can be monitored 
while the system is exercised using routines in the 
operating system. 

Fig. 7 shows COPAS as part of a system for 
signal processing experiments consisting of a high 
quality ADC and DAC, paper tape reader, digital 
cassette unit and terminal through which the entire 
system is controlled. 



* The printed circuit board layout was arranged by D.J. Marshall. 



Fig. 6(a) - COPAS module

Fig. 6(b) - COPAS as a stand alone computer

Fig. 7 - COPAS experimental audio system



7. Programming



To program COPAS generally involves program- 
ming two computers, viz. the MPU and the HSP, 
in their respective languages. Fortunately, the 
firmware (software fixed in PROM) of the MPU 
includes routines which resolve the timing diffi- 
culties of dynamic data exchange. Therefore, 
what remains is effectively the task of program- 
ming each independently, with a knowledge of 
the fixed routines that are available. Details of 
these routines and of the operating system in 
general are given in Section 7.5. For further 
details of programming the 8085 the reader is 
referred to a wealth of literature produced by 
Intel on the subject. 

Up to this point, however, little has been 
said about programming the HSP in COPAS. 
The following sections present the parameters 
required for microprogram execution and discuss
how these are used by a programmer to implement 
algorithms. An example is given to illustrate these 
principles. 

7.1. Instruction format 

The microinstruction register (pipeline 



register) of Fig. 3 contains all the control bits to 
implement all the control functions for all the con- 
trolled elements in COPAS. Although the number 
of control signals required is greater than the 48 
used, the number was reduced to 48 by a process 
of formatting of the microinstructions. Table 2 
shows the allocation of the bits, their control 
functions, and mnemonics associated with each 
group of bits or field. 

A simple example of formatting is given by 
the operand field. Here a group of 10 bits has 
been assigned five different functions according 
to the type of instruction which is being executed.
This field is usually reserved for providing an 
operand with the instruction for direct use in the 
arithmetic unit. However, it can also be used as a 
branch address in, for example, an instruction 
incorporating a conditional jump; or as a number 
to load into the loop counter when a particular 
group of microinstructions is to be repeated a 
number of times; or to address selected input/ 
output ports if the microinstruction includes an 
input data or output data statement; or to supply 
data directly to the AI port of the bit-slices. 
Clearly, only one of these features can be used 
during a single machine cycle but in practice this 
is a small penalty for the saving in microinstruction
word length, i.e. if these features were coded
separately the word length would increase from 48
bits to 71 bits. This would represent a significant
increase in the size of the microprogram memory.

As can be seen from Table 2, the 48 bit
microinstruction is divided into a number of
separate fields, each of which controls a particular
hardware unit or group of units. Now consider 
the purpose of each of these control fields. 

(i) MULT field — these bits control the functions 
of the hardware multiplier, i.e. they initiate 
the loading of data as multiplicand/multiplier 
and enabling the product on to the BI/O bus 
of COPAS. One bit permits a double pre- 
cision product to be obtained. 

(ii) OP-CODE (1) — this collection of bits deter- 
mines the arithmetic/logical functions of the 
bit-slice. It also specifies the source and 
destinations of the operands and results; 
and it provides for left/right shifting of data. 

(iii) OP-CODE (2) — these bits attend to ancillary 
functions of the HSP such as the enabling of 
data onto the BI/O or DOP buses or the 
direct loading of the working register from 
the OPERAND field. It also permits the PC 
and MC used as scratchpad address registers to 
be auto-incremented. 

(iv) CARRY — these two bits are used in associa- 
tion with the ALU operations and the incre- 
menting of the PC and MC. 

(v) ADDR — this field selects between two modes 
of data memory addressing. Direct address- 
ing uses the OPERAND field as the source of 
addresses for the scratchpad memory. In- 
direct addressing uses either the PC or MC to 
point into the address space. 

(vi) POL and TEST SELECT - the previous ALU 
operation can be tested for various states, e.g. 
equality after a comparison of two numbers. 
These two fields select the test required, and 
the polarity of that test. This permits a con- 
ditional branch to be made on either the pass- 
ing or failure of a test. 

(vii) I/O, MEM — these two fields control the read- 
ing of data from input ports or memory and 
the writing of data into output ports or mem- 
ory. Note that some combinations of these 
are permissible simultaneously; for example, 
an indirect memory read can be combined 
with an output to data port instruction. 



(viii) NEXT ADDR - this field is interpreted by
the sequencer to provide 14 next address 
choices. 

(ix) OPERAND - a 10 bit number can be used 
here appropriate to the other fields of the 
microinstruction, i.e. as data, memory add- 
ress, branch address, I/O port address or loop 
count. 

7.2. Mnemonic description 

The task of generating a single microinstruc- 
tion involves the orderly determination of each bit 
of a microinstruction word. This would represent 
a tedious and error-prone procedure if this were to 
be accomplished at the machine code level. There- 
fore, each combination of bits in a field that has 
a meaningful operation in the complete machine 
is assigned a mnemonic, and so each microinstruc- 
tion can be built up as a sequence of mnemonics. 
In turn, these mnemonics can be recognised by 
an assembler and translated into the appropriate 
machine code automatically. In each group of 
mnemonics, there is one which has been defined 
as a default condition — in other words, if that 
particular field is not explicitly specified then the 
assembler should automatically insert a code 
corresponding to a condition which is designed 
to have no effect, i.e. a no-operation or NOP. At 
this stage of development, no restriction has been 
imposed on the instruction set with the inevitable 
result that several mnemonics are needed to gener- 
ate a single microinstruction. However, future 
versions of the assembler will have a more refined 
instruction set which will in turn simplify the 
process of microprogramming. For example, 
an unconditional jump could be easily described 
with a single mnemonic, whereas in the version 
described here three mnemonics are needed. 
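
The way in which mnemonics and default conditions combine into a microinstruction word can be sketched as follows. The field names, widths and bit codes in this example are hypothetical and far smaller than the real 48 bit format of Table 2; only the principle of one mnemonic per field, with a no-operation default for any field left unspecified, is taken from the description above.

```python
# Hypothetical fields, listed from most to least significant.
FIELDS = {
    "ALU": {"width": 4, "default": "NOP",
            "codes": {"NOP": 0x0, "ADD": 0x3}},
    "MEM": {"width": 2, "default": "NOMEM",
            "codes": {"NOMEM": 0x0, "RD": 0x1, "WRT": 0x2}},
    "NEXT": {"width": 4, "default": "CONT",
             "codes": {"CONT": 0xE, "CJP": 0x3}},
}


def assemble(mnemonics):
    """Build one microinstruction word from a sequence of mnemonics."""
    chosen = {}
    for mnemonic in mnemonics:
        owners = [name for name, spec in FIELDS.items()
                  if mnemonic in spec["codes"]]
        if len(owners) != 1:
            raise ValueError("unknown mnemonic: " + mnemonic)
        field = owners[0]
        if field in chosen:
            raise ValueError("field specified twice: " + field)
        chosen[field] = mnemonic
    word = 0
    for name, spec in FIELDS.items():
        # A field that was not specified takes its default condition.
        code = spec["codes"][chosen.get(name, spec["default"])]
        word = (word << spec["width"]) | code
    return word
```

With these assumed codes, a line containing no mnemonics at all assembles to the all-default word, and a line such as ADD RD leaves the NEXT field at its default.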

7.3. The cross-assembler

An assembler program, or more correctly, 
a meta-assembler, has been written which recog- 
nises the mnemonics of Table 2 and converts them 
into machine code. The program is written in 
Super-Fortran which is available on the Tymshare 
time-shared computer network. The program 
allows the user to specify operands and memory 
addresses symbolically and use alpha-numeric 
labels to mark program statements. It allocates 
storage as required and checks for inconsistencies 
in syntax etc., and finally produces the required 
machine code punched on paper tape, formatted 
so that it can be entered directly into COPAS 
program and data memories. A second output,
which is of great importance, is the microprogram
listing which can be fully annotated with comments.
For full details of the techniques used
in the assembler and the method of use, the
reader is referred to a companion Report.
The assembly of a simple program is discussed in
Section 7.6.

Fig. 8 - Software map (power on; initialise system; self test; command interpreter, driven from the terminal or the GPIB; debug: single step; plot and user routine; edit: disassembler, insert, change and list, acting on the program and data memories; cassette and paper tape handler: read PT, read cassette, locate program, rewind and advance, write on cassette; bus arbitration)

7.4. Program locking 

There is one part of every program written
for COPAS that will be the same if that program
is to run in real-time locked to the sampling
pulses. The COPAS hardware provides the sampling
pulse itself as one of the selectable test signals
to the sequencer. At the end of each microprogram,
a set of three instructions synchronises
the start of execution of the microprogram with
the sampling pulse. This will be covered in greater
detail in Section 7.6 in which an example COPAS
program is discussed.

7.5. Operating system description 

An overview of the firmware built into the 
support processor is given in Fig. 8. On turning 
on, the first task of the support processor is to 
initialise the system. This is done in two stages. 
Firstly, the support processor itself must be 
initialised and those ports shared with the HSP 



set up as inputs. Valid data must be transmitted 
to the peripherals to prevent spurious activity. 
Then the high speed processor must be initialised.
The main problem here is that the microprogram 
memory is constructed from read/write memory 
elements and it will contain random data on 
power-up. It is therefore necessary to insert a 
valid instruction in the memory which prevents 
hardware conflicts when it is executed. When 
this procedure is complete, a sign-on message is 
transmitted to the terminal. Fig. 8 shows that a 
self-test routine could be executed at this time 
and although this is not included in the firmware 
at present, the MPU has access to all the data 
and address lines in the HSP to make this test
very comprehensive, if required. 

The next stage in the firmware is the com- 
mand interpreter. Commands can either come 
from the terminal or via the GPIB interface, and 
as a result of a command, one of the indicated 
functions will be executed. These routines have 
been described in Section 6.4 except for the user 
routines. This is the 'S' command of Table 1, 
which transfers control to programs written 
by the user that have been loaded into read/ 
write memory of the support processor. This 
gives the user the ability to test programs simply 
before committing them to PROM. 






ASSEMBLY OF BIQUAD    07/23 12:07 1979    PAGE 1

DATA DECLARATION

SYMBOL   OP.   ARGUMENT   CELL

SNDIN    EQU   £000
SNDOUT   EQU   £007

B1       DC    £7FA7      0
B2       DC    £80A4      1
C1       DC    £C02C      2
C2       DC    £3E90      3
A0       DC    £411D      4

ST1      DS    1          5
ST2      DS    1          6

TEMP     DS    1          7

END


ASSEMBLY OF BIQUAD    07/23 12:07 1979

MAIN PROGRAM

LINE  ADDR  MNEMONICS                                   HEX. CODE

  0     0   :RESET NOP CJP TST2 F :RESET                20FE F02E 3000
  1     1   ADD L L NC WR                               3C86 F90E E000
  2     2   NOP DOWR WRT ST1                            20FC F101 E005
  3     3   NOP DOWR WRT ST2                            20FC F101 E006
  4     4   :BEGIN INX NOP IN SNDIN                     A0FE F106 E000
  5     5   ADD BI L NC WR RD ST1                       3116 F900 E005
  6     6   OTZM ADD BI WR NC WR                        E6D6 F90E E000
  7     7   INX ADD WR L NC SIG DOWR BOUT OUT SNDOUT    B4CC 790A E007
  8     8   INY NOP RD B1                               60FE F100 E000
  9     9   NOP DOWR WRT TEMP                           20FC F101 E007
 10     A   OTZM ADD BI L NC XWR                        F166 F90E E000
 11     B   INX NOP IN SNDIN                            A0FE F106 E000
 12     C   INY NOP RD C1                               60FE F100 E002
 13     D   INY NOP RD C2                               60FE F100 E003
 14     E   OTZM ADS LSA BI XWR NC XWR                  EBDE F90E E000
 15     F   ADD BI XWR NC XWR RD ST2                    2566 F900 E006
 16    10   NOP DOXWR WRT ST1                           20FA F101 E005
 17    11   OTZM ADD BI L NC WR                         F116 F90E E000
 18    12   INX NOP RD TEMP                             A0FE F100 E007
 19    13   INY NOP RD B2                               60FE F100 E001
 20    14   INY NOP RD A0                               60FE F100 E004
 21    15   OTZM ADD BI L NC WR                         F1416 F90E E000
 22    16   NOP DOWR WRT ST2                            20FC F101 E006
 23    17   NOP PUSH                                    20FE F10E 4000
 24    18   NOP LOOP PSP T                              20FE F10E D000
 25    19   ADD BI L NC SIG DOSIG CJP TF T :BEGIN       5118 F91E 3004

END



Fig. 9 - Listing produced by the assembler for the program example 



7.6. Program example 

Fig. 9 is the listing of a short program to
illustrate the features of a COPAS program
developed using the cross-assembler. The program
executes a single biquadratic section filter whose
transfer function, using z-plane notation, is

$$H(z) = \frac{A_0 + C_1 z^{-1} + C_2 z^{-2}}{1 + B_1 z^{-1} + B_2 z^{-2}}$$

This transfer function can be realised using
the network of adders, multipliers and delays
shown in Fig. 10. The listing consists of two distinct
sections: a data declaration area, in which
all symbols used in the program are defined and
memory space reserved; and the program statement
area, which contains the instructions
executable by a COPAS processor.
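
For reference, taking the inverse z-transform of the transfer function H(z) given above yields the time-domain recurrence below. The sequence names x(n) and y(n) for the input and output samples are assumed here for illustration, and the signs follow the denominator as written in H(z); any scaling applied to the stored coefficient values in Fig. 9 is not reflected in this form.

$$y(n) = A_0\,x(n) + C_1\,x(n-1) + C_2\,x(n-2) - B_1\,y(n-1) - B_2\,y(n-2)$$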

Fig. 10 - Biquadratic filter section used for the program example

7.6.1. Data declaration area

The format of this series of commands is

SYMBOL OPERATOR ARGUMENT

and three kinds of operator are recognized by the
assembler at this stage.

EQU - An equivalence operator used to
assign a constant value to a user symbol for subsequent
use in literal commands. The value of
the constant is set to the argument of the command.
No location in data memory is involved
since the constant forms the argument of a microinstruction
in program memory. In this example,
the input and output ports are symbolically represented
by SNDIN and SNDOUT.

DC - This operator names a single location
in data memory for the symbol and loads a constant
equal to the argument of the command into
the reserved location. In the example, hexadecimal
values (denoted by £) are assigned to the
five filter coefficients B1, B2, C1, C2 and A0.

DS - This operator defines space in the data
memory. The amount of space reserved is defined
by the argument, the first location of which is
assigned to the symbol. In the example, space is
reserved for the two digital delays of the filter, ST1
and ST2, and for a temporary store, TEMP, used to
hold intermediate results.

The data declaration area is always terminated
by an END command.
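
The behaviour of the three operators can be illustrated with a short Python sketch. This is not the cross-assembler itself: the function name `allocate`, the rule that cells are handed out sequentially from zero, and the list `BIQUAD_DECLARATIONS` (values as read from Fig. 9) are assumptions made for illustration, chosen because they reproduce the CELL column of the listing.

```python
def allocate(declarations):
    """Mimic the data declaration pass: EQU, DC and DS, ended by END."""
    constants, cells, initial = {}, {}, {}
    next_cell = 0  # assumption: cells are handed out from 0 upwards
    for text in declarations:
        fields = text.split()
        if fields == ["END"]:
            break
        symbol, operator, argument = fields
        # A leading pound sign marks a hexadecimal argument, as in Fig. 9.
        value = int(argument[1:], 16) if argument.startswith("£") else int(argument)
        if operator == "EQU":
            constants[symbol] = value          # no data-memory cell is used
        elif operator == "DC":
            cells[symbol] = next_cell          # one cell, preloaded with the constant
            initial[next_cell] = value
            next_cell += 1
        elif operator == "DS":
            cells[symbol] = next_cell          # reserve 'value' cells, uninitialised
            next_cell += value
        else:
            raise ValueError(f"unknown operator: {operator}")
    return constants, cells, initial


# Hypothetical transcription of the declarations in Fig. 9.
BIQUAD_DECLARATIONS = [
    "SNDIN EQU £000", "SNDOUT EQU £007",
    "B1 DC £7FA7", "B2 DC £80A4", "C1 DC £C02C", "C2 DC £3E90", "A0 DC £411D",
    "ST1 DS 1", "ST2 DS 1", "TEMP DS 1",
    "END",
]

constants, cells, initial = allocate(BIQUAD_DECLARATIONS)
```

Under these assumptions ST1, ST2 and TEMP land in cells 5, 6 and 7, which agrees with the operand fields seen in the main program listing.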

7.6.2. Program statement area 

The format for these commands is

:LABEL OP-CODE1 A B C D OP-CODE2 OP3 OP4 OPERAND

:LABEL - This is an optional alphanumeric
statement identifier of up to six characters and is
always preceded by a colon (:).

OP-CODE1 - This is a mandatory mnemonic
which specifies the CPU operation. If no
operation is required a NOP must be inserted,
which sets each of the fields of the microinstruction
to a default state. A list of possible
mnemonics is given in Table 2, most of which
require additional mnemonics (A,B,C,D) to completely
specify the operation. For example an
ADD operation requires two sources A and B, a
carry bit C, and a destination D to be specified.
The mnemonics (A,B,C,D) must be supplied
according to the requirements of OP-CODE1 and
greater detail on these matters can be obtained
from the manufacturer's data.

OP-CODE2 - This is an optional mnemonic
controlling ancillary functions of the CPU, for
example, controlling the operation of the memory
address registers. Table 2 gives a complete list
of mnemonics that can be substituted at this
point.

OP3, OP4, etc. - These are additional
optional mnemonics which are used to specify
the remaining fields of the microinstruction:
the input/output, memory addressing mode,
next address, etc.

OPERAND - This can be a numeric value,
a symbolic name, a statement label or an input/
output port address according to the requirements
of the previous mnemonics used.

The program statement area is terminated by
an END command.

Now consider the program statements of the
example. These can be conveniently divided into
four areas:

Line 0 - the reset code

Lines 1-3 - initialisation

Lines 4-22 - execution of the biquadratic algorithm

Lines 23-25 - program locking.

Fig. 10 shows how the registers, stores and
ports are used in this example.

7.6.3. The reset code

The reset code is a single microinstruction
loop at line 0 (see Fig. 11). A test is performed
using TST2 and if the test fails the CJP
mnemonic results in a jump to the address held
in the operand field of the pipeline register. Since
this field contains the numeric substitution for
the label :RESET, the instruction jumps to itself.
This situation will prevail until the test is passed,
when the μPC will increment to the next instruction.
The TST2 test is connected to a front
panel 'START' button and so this provides the
user with the ability to initiate the algorithm
which follows. Additionally, when a hardware
or software 'RESET' occurs, the processor latches
into this controlled state.

Fig. 11 - Program sequence for the program example

7.6.4. Initialisation 

Most programs require some kind of 
initialisation. These are program steps that are 
not necessarily carried out for each execution 
of an algorithm. At line 1, a zero is 'added' 
into the working register (WR) and in lines 2,3 
the contents of the working register are written 
into the data memory locations represented 
by ST1,ST2. From the data declaration area 
it can be seen that ST1,ST2 were allocated to 
cells 5 and 6 of the data memory and that this 
substitution has been made in the operand field 
of the hexadecimal code produced by the assembler.
Now that the two stores of the filter have
been set to zero, the filter program can be started
with known initial conditions.
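
The substitution of cell numbers into the operand field can be checked against the hexadecimal code of lines 2 and 3 in Fig. 9. The sketch below assumes, purely from inspection of the listing, that each microinstruction is printed as three 16-bit words and that the operand occupies the low 12 bits of the last word; the report does not state the field layout, and the function name `operand_field` is hypothetical.

```python
def operand_field(hex_words):
    """Return the low 12 bits of the last word of a listed microinstruction.

    Assumption: the operand sits in the low 12 bits of the third word,
    inferred from Fig. 9 rather than taken from a documented format.
    """
    last_word = int(hex_words.split()[-1], 16)
    return last_word & 0x0FFF


# Lines 2 and 3 of the listing write WR to ST1 and ST2 respectively.
st1_cell = operand_field("20FC F101 E005")
st2_cell = operand_field("20FC F101 E006")
```

With this reading the two instructions address cells 5 and 6, matching the allocation of ST1 and ST2 in the data declaration area.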

7.6.5. Execution of the biquadratic algorithm 

The operations on lines 4-22 are mainly
arithmetic and 'move' operations and these can
best be represented using arithmetic symbols and
arrows (->) to indicate their action. Where this
method is not sufficient to indicate the action
of a microinstruction, a comment is given in
Table 3.
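
As an aid to reading the register transfers of Table 3, the sketch below replays them for one sample in Python. It is a simplified model under stated assumptions: floating-point arithmetic stands in for the 16 bit fixed-point hardware, the loading of the multiplier registers INX and INY is folded into the multiplications, and the names `biquad_step`, `coef` and `state` are invented for illustration.

```python
def biquad_step(x, coef, state):
    """Process one input sample x following the operations of Table 3."""
    wr = state["ST1"] + 0.0                 # line 5:  [ST1] + 0 -> WR
    wr = coef["A0"] * x + wr                # line 6:  A0 * SNDIN + WR -> WR
    y = wr                                  # line 7:  WR -> SNDOUT
    temp = wr                               # line 9:  WR -> [TEMP]
    xwr = coef["B1"] * y                    # line 10: B1 * SNDOUT -> XWR
    xwr = 2.0 * (coef["C1"] * x + xwr)      # line 14: 2 (C1 * SNDIN + XWR) -> XWR
    xwr = state["ST2"] + xwr                # line 15: [ST2] + XWR -> XWR
    state["ST1"] = xwr                      # line 16: XWR -> [ST1]
    wr = coef["C2"] * x + 0.0               # line 17: C2 * SNDIN + 0 -> WR
    wr = coef["B2"] * temp + wr             # line 21: B2 * TEMP + WR -> WR
    state["ST2"] = wr                       # line 22: WR -> [ST2]
    return y


# Illustrative coefficients only; these are not the values of Fig. 9.
coef = {"A0": 0.5, "B1": 0.25, "C1": 0.1, "B2": -0.3, "C2": 0.2}
state = {"ST1": 0.0, "ST2": 0.0}            # stores cleared, as in lines 1-3
impulse_response = [biquad_step(x, coef, state) for x in (1.0, 0.0, 0.0)]
```

The doubling at line 14 means that, in this model, the C1 and B1 terms contribute with twice the stored coefficient value.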



7.6.6. Program locking code 

Lines 23-25 lock the software to the
hardware by using the sampling pulses as a test
input. At line 23 a PUSH instruction puts the
address of the current instruction plus one on
the top of the push-down stack in the sequencer.
At line 24 a LOOP instruction examines the
sampling pulse (PSP) and while it is true (T)
executes the microinstruction whose address
was pushed on the stack during the previous
microinstruction (see Fig. 11). This corresponds
to the LOOP instruction itself and so a single
microinstruction loop has been set up from which
escape occurs only when the sampling pulse is
'false' and the test fails. When the test fails
the next sequential microinstruction at line 25
is executed and this is an unconditional jump
to label :BEGIN. The assembler has associated
an absolute address value with this label (004) and
so the program will restart at line 4. An additional
characteristic of this microinstruction is
that it is used for DDE. By calculating

BI + L + NC -> SIG

the B bus of the bit-slice is routed to the internal
Σ bus (Fig. 2). For this instruction OP-CODE2 is
DOSIG, which transmits the Σ bus to the DOP
port and so to the data memory. Thus a route
has been set up which allows data read from the
data memory to be passed via the B port of the
bit-slice and the Σ bus, to the DOP port of the
bit-slice and then to the high speed data port
of the microprocessor support system.
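
The PUSH, LOOP and jump sequence described above can be traced with a small Python model. It is only a sketch: one list element of `psp_levels` is assumed to represent the sampling-pulse test input seen in one machine cycle, the addresses are those of the ADDR column in Fig. 9, and the function name `cycles_until_restart` is hypothetical.

```python
def cycles_until_restart(psp_levels, push_addr=0x17, begin_addr=0x04):
    """Count machine cycles from the PUSH until control returns to :BEGIN.

    Returns None if the sampling pulse never goes false in psp_levels.
    """
    stack = []
    pc = push_addr
    for cycle, psp in enumerate(psp_levels):
        if pc == push_addr:            # line 23: PUSH current address plus one
            stack.append(pc + 1)
            pc += 1
        elif pc == push_addr + 1:      # line 24: LOOP PSP T
            if psp:
                pc = stack[-1]         # test true: re-execute the LOOP itself
            else:
                stack.pop()            # test fails: fall through to line 25
                pc += 1
        else:                          # line 25: unconditional jump to :BEGIN
            pc = begin_addr
            return cycle + 1
    return None


waiting = cycles_until_restart([True, True, True, False, False])
immediate = cycles_until_restart([True, False, False])
never = cycles_until_restart([True, True, True, True])
```

In this model the program spends one extra cycle in the LOOP instruction for every cycle in which the pulse is still true, so the restart at line 4 is tied to the edge of the sampling pulse rather than to the length of the algorithm.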



TABLE 3
Analysis of program example

LINE  OPERATION                     COMMENT

  4   SNDIN -> INX                  audio sample into X register of multiplier
  5   [ST1] + 0 + NC -> WR          contents of address ST1 into WR
  6   A0 * SNDIN + WR -> WR
  7   WR -> SNDOUT and INX          output sample and send to multiplier via Σ bus
  8   B1 -> INY                     coefficient B1 into multiplier
  9   WR -> [TEMP]                  store output sample temporarily
 10   B1 * SNDOUT -> XWR            move product to extended working register
 11   SNDIN -> INX                  reload multiplier from input port
 12   C1 -> INY
 13   C2 -> INY
 14   2 (C1 * SNDIN + XWR) -> XWR   arithmetic left shift (multiply by two)
 15   [ST2] + XWR -> XWR            add in contents of address ST2
 16   XWR -> [ST1]
 17   C2 * SNDIN + 0 + NC -> WR     move product to working register
 18   [TEMP] -> INX
 19   B2 -> INY
 20   A0 -> INY                     load A0 ready for next run
 21   B2 * TEMP + WR + NC -> WR
 22   WR -> [ST2]


8. Discussion of performance and applications

The prototype COPAS module operates on
signals sampled at 32 kSPS with 16 bit accuracy.
The machine cycle time is 162.7 ns giving 192
instructions in the interval between sampling
pulses. It has 2 separate parallel input ports
and a single output port which gives access to 16
registers which can be addressed externally.

Several test programs have been written to
confirm the correct operation of the processor.
For example, a biquadratic section digital filter
program executed using 19 microinstructions,
and a digital sine-wave oscillator program executed
in 12 instructions, were written. This indicates
a factor of 5 improvement over the Miproc based
system. Additionally, the MPU was programmed
to select coefficients for the filter, from a library
of coefficients held in the MPU memory, according
to a command given from the terminal. These
coefficients were transferred into the scratchpad
by DDE and so no interruption to the real time
processing occurred when filter characteristics
were changed. Similarly, the MPU was programmed
so that the digital oscillator could produce
sequences of musical notes automatically, again
using DDE.

This level of performance was entirely satisfactory
for the intended applications. Studies
have shown that, for example, a 64 stage transversal
filter can be realised at a 32 kHz sample
rate, or third octave filtering for a 24 band spectrum
analyser at a 16 kHz sampling rate. A host
of special audio effects, such as phasing, pitch
changing and chorus effects, could be easily implemented.
The generation of precise test material
such as white noise, pink noise, and warble tone,
and even elementary music synthesis could be
done. Additionally, companding systems, compressors
and limiters, program level monitoring
systems, vocoders, reverberation time measurements,
and loudspeaker quality assessment could
be adequately managed by a single COPAS module.
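
The instruction budget quoted at the start of this section, and what it implies for the 64 stage transversal filter, can be checked with a few lines of Python. The function name `instructions_per_sample` is hypothetical, and the per-stage figure is simple division that ignores any overhead such as the program locking instructions.

```python
def instructions_per_sample(sample_rate_hz, cycle_time_ns):
    """Whole machine cycles available between two sampling pulses."""
    sample_period_ns = 1e9 / sample_rate_hz
    return int(sample_period_ns / cycle_time_ns)


budget = instructions_per_sample(32_000, 162.7)   # prototype figures from the text
per_stage = budget // 64                          # cycles per transversal filter stage
```

On these figures a 64 stage transversal filter at 32 kHz leaves about three machine cycles per stage.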






Linked with bulk storage this list could be 
extended to include artificial reverberation, echo, 
and other effects. 



9. Conclusions 

COPAS is the result of research into a low cost,
high speed LSI signal processor. Its primary pur- 
pose is as a tool for further research in audio pro- 
cessing that can be used on its own in a dedicated 
system or in large numbers where the computa- 
tional load is great. The design that evolved meets 
these requirements and points the way to even 
more powerful processors. By separating the high 
speed data processing from the slower functions 
required, a versatile support processor was created 
which provides the user with comprehensive pro- 
gram development support. A cross-assembler 
has been written which recognises the mnemonics 
created for COPAS and generates listings and pro- 
gramming tapes directly. 

Test programs have been successfully run on 
the machine and thus it was confirmed that the 
performance meets its design objectives. 



10. References 

1. RABINER, L.R. 1972. Terminology in 
digital signal processing. IEEE Trans. Audio 
and Electroacoustics, Vol. AU-20, Dec. 1972, 
pp. 332- 337. 

2. McNALLY, G.W. 1979. A computer based 
mixing and filtering system for digital sound 
signals. BBC Research Department Report 
No. 1979/4. 

3. McNALLY, G.W. 1978. The use of a pro- 
grammed computer to perform real-time 
companding of high quality sound signals. 
BBC Research Department Report No. 
1978/14. 

4. GOLD, B., LEBOW, I.L., McHUGH, P.G.
and RADER, C.M. 1971. The FDP - a
fast programmable signal processor. IEEE
Trans. Comput., Vol. C-20, Jan. 1971,
pp. 33 - 38.

5. MICK, J.R. and BRICK, J. 1977. Micro- 
programming a bipolar microprocessor. 
Microprogramming Handbook. Advanced 
Micro Devices, 1977. 

6. AISO, H., TOKORO, M., UCHIDA, S., MORI,
H., KANEKO, N. and SHIMADA, M. 1974.
A very high speed microprogrammable pipeline
signal processor. Proc. IFIP Congr.
1974, Aug. 1974, pp. 60 - 64.

7. GOLD, B. 1974. Parallel and sequential
trade-offs in signal processing computers.
Nat. Telecommun. Conf. Rec., Dec. 1974,
pp. 491 - 495.

8. HARSHMAN, J.V. 1974. Architecture of
a programmable signal processor. Nat. Telecommun.
Conf. Rec., Dec. 1974, pp. 496 - 500.

9. WILKES, M.V. 1951. The best way to 
design an automatic calculation machine. 
Manchester University Computer Inaugural 
Conf., 16, 1951. 

10. BOULAYE, C.C. Microprogramming, 1975. 
Macmillan Press, London. 

11. HUSSON, S.S. Microprogramming principles 
and practices. Prentice-Hall, Englewood 
Cliffs, N.J., 1970. 

12. WU, Y.S. 1972. Architectural considera- 
tions of a signal processor under micro- 
program control. AFIPS Conf. Proc, Vol. 
40, May 1972, pp. 675 - 683. 

13. WEINSTEIN, C.J. and OPPENHEIM, A.V. 
1969. A comparison of round-off noise in 
floating point and fixed point digital filter 
realisations. Proc. IEEE, Vol. 57, June 
1969, pp. 1181 - 1183. 

14. OPPENHEIM, A.V. and WEINSTEIN, C.J. 
1972. Effects of finite register length in 
digital filtering and fast Fourier transform. 
Proc. IEEE, Vol. 60, Aug. 1972, pp. 957 - 
976. 

15. CLAASEN, T.A., MECKLENBRAUKER, 
W.F.G. and PEEK, J.B.H. 1976. Effects of 
quantisation and overflow in recursive digital 
filters. IEEE Trans, on Acoustics, Speech 
and Signal Processing, Vol. ASSP-24, No. 6, 
Dec. 1976. 

16. ALLEN, J. 1975. Computer architecture 
for signal processing. Proc. IEEE, Vol. 63, 
Apr. 1975, pp. 624-633. 

17. ARNOTT, R.D. and MARKEL, J.D. 1978.
Fortran control of real-time signal processing
with high speed processors. IEEE Trans. on
Acoustics, Speech and Signal Processing,
Vol. ASSP-26, No. 4, Aug. 1978, pp. 278 -
284.

18. ALI, Z.M. 1978. A high speed FFT pro- 
cessor. IEEE Trans, on Communications, 
COM-26, No. 5, May 1978, pp. 690 - 696. 

19. BLENKENSHIP, P.E. 1975. LDVT : High 
performance mini computer for real-time 
speech processing. EASCON Conf., Washing- 
ton, Sept. 1975. 

20. BLENKENSHIP, P.E., HUNTOON, A.H. and 
SFERRINO, V.J. 1974. LSP/2 programm- 
able signal processor. Proc. Nat. Electron. 
Conf., Vol. XXIX, Oct. 1974, pp. 416 - 
421. 



21. STEINGART, D. and ZAKS, R. 1976.
Bit-slice. SYBEX.



22. HOFSTETTER, E.M., TIERNEY, J. and
WHEELER, O. 1977. Microprocessor
realisation of a linear predictive vocoder.
IEEE Trans. on Acoustics, Speech and Signal
Processing, Vol. ASSP-25, No. 5, Oct. 1977,
pp. 279 - 387.

23. LOUGHRY, D.C. and ALLEN, M.S. 1978. 
IEEE Standard 488 and microprocessor 
synergism. Proc. IEEE, Vol. 66, No. 2, 
Feb. 1978, pp. 162 - 171. 

24. BELLIS, F.A. 1979. An assembler for 
COPAS microprogrammers. BBC Research 
Department Report in course of preparation. 



SMW/SB 

(PH-212) 






Printed by BBC RESEARCH DEPARTMENT, Kingswood Warren, Tadworth, Surrey, KT20 6NP 



